Improvement in performance of Chip-multiprocessor using Effective Dynamic Cache Compression Scheme

Author

  • Poonam Aswani
Abstract

Chip multiprocessors (CMPs) combine multiple cores on a single die, typically with private level-one (L1) caches and a shared level-two (L2) cache. Caches are the primary means of bridging the gap between processor and memory speed. However, as the number of cores on a single chip grows, so does the demand on a critical resource: shared L2 cache capacity. This dissertation introduces a lossless compression algorithm designed for fast data compression, with the ultimate goal of improving CMP performance. Cache compression stores lines in the cache in compressed form, which can increase the effective cache size, reduce off-chip misses, and improve performance. On the downside, decompression overhead lengthens cache hit latency and can degrade performance. While compression can have a positive impact on CMP performance, practical implementations raise several concerns: (1) compression algorithms are costly to implement at the cache level; (2) decompression overhead can degrade performance; (3) compression algorithms are generally ineffective on small blocks; and (4) hardware modification is required. This dissertation makes contributions that address these concerns. We propose a compressed L2 cache design based on an effective compression algorithm with low decompression overhead. We develop a dynamic cache compression scheme that adapts to the costs and benefits of cache compression and applies compression only when it is expected to improve performance. We show that cache compression improves CMP performance for different workloads.
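The abstract does not say how the dynamic scheme weighs costs against benefits. As a purely illustrative sketch, one common way to make such a decision is a saturating counter that gains credit whenever compression avoids an off-chip miss and loses credit whenever a hit pays a decompression penalty for no gain. Everything below is an assumption for illustration and is not taken from the dissertation: the class name `AdaptiveCompressionPolicy`, its method names, and the cycle counts are all hypothetical.

```python
class AdaptiveCompressionPolicy:
    """Hypothetical cost/benefit predictor for cache compression.

    Illustrative only: the names and default penalties are assumptions,
    not the design described in the abstract.
    """

    def __init__(self, miss_penalty=400, decompression_penalty=5, limit=100_000):
        self.miss_penalty = miss_penalty                    # assumed cycles saved per avoided miss
        self.decompression_penalty = decompression_penalty  # assumed cycles lost per penalized hit
        self.limit = limit                                  # saturation bound of the counter
        self.counter = 0                                    # > 0 means compression has been paying off

    def _bump(self, delta):
        # Saturating add: clamp the counter to [-limit, +limit].
        self.counter = max(-self.limit, min(self.limit, self.counter + delta))

    def record_avoided_miss(self):
        # A hit that was only possible because compression freed up space.
        self._bump(self.miss_penalty)

    def record_penalized_hit(self):
        # A hit to a compressed line that would have hit even without compression.
        self._bump(-self.decompression_penalty)

    def should_compress(self):
        # Compress newly allocated lines only while the net benefit is non-negative.
        return self.counter >= 0


policy = AdaptiveCompressionPolicy()
policy.record_penalized_hit()      # decompression cost with no capacity benefit
print(policy.should_compress())
policy.record_avoided_miss()       # one avoided miss outweighs many small penalties
print(policy.should_compress())
```

With these assumed penalties, a single avoided miss offsets eighty penalized hits, which reflects the usual asymmetry between off-chip miss latency and decompression latency; the real ratio depends on the hardware.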


Similar resources

Base-Delta-Immediate Compression: A Practical Data Compression Mechanism for On-Chip Caches

Cache compression is a promising technique to increase cache capacity and to decrease on-chip and off-chip bandwidth usage. Unfortunately, directly applying well-known compression algorithms (usually implemented in software) leads to high hardware complexity and unacceptable decompression/compression latencies, which in turn can negatively affect performance. Hence, there is a need for a simple...


A Fast Dynamic Compression Scheme for Low-Latency On-Chip Address Buses

As implementation technology scales down, interconnects become a major impediment to improving performance and reducing cost. In this paper, we present a new compression scheme, called partial match compression, to compress addresses dynamically and also analyze how area slack arising from compression can be exploited for bus latency improvement by increasing inter-wire spacing. Our results sho...


A Single Chip Multiprocessor Integrated with High Density DRAM

A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing memory latency and improving memory bandwidth. In this paper we evaluate the performance of a single chip multiprocessor integrated with DRAM when the DRAM is organized as on-chip main memory and as on-chip cache. We compare the performance of this architecture with that of a more c...


Near Fine Grain Parallel Processing Using Static Scheduling on Single Chip Multiprocessors

With the increase of the number of transistors integrated on a chip, efficient use of transistors and scalable improvement of effective performance of a processor are getting important problems. However, it has been thought that popular superscalar and VLIW would have difficulty to obtain scalable improvement of effective performance in future because of the limitation of instruction level para...


Code Compression Algorithm for High Performance Micro Processor

Modern processors use two or more levels of cache memories to bridge the rising disparity between processor and memory speeds. Microprocessor designers have been torn between tight constraints on the amount of on-chip cache memory and the high latency of off-chip memory, such as dynamic random access memory. Accessing off-chip memory generally takes an order of magnitude more time than accessin...



Publication date: 2014